

Karush-Kuhn-Tucker Condition

We consider the following problem:

$$\begin{aligned} \text{minimize } & f(\boldsymbol{x}) \\ \text{subject to } & \boldsymbol{h}(\boldsymbol{x})=\mathbf{0}, \\ & \boldsymbol{g}(\boldsymbol{x}) \leq \mathbf{0}, \end{aligned}$$

where $f: \mathbb{R}^n \rightarrow \mathbb{R}$, $\boldsymbol{h}: \mathbb{R}^n \rightarrow \mathbb{R}^m$, $m \leq n$, and $\boldsymbol{g}: \mathbb{R}^n \rightarrow \mathbb{R}^p$. For the general problem above, we adopt the following definitions.

Definition

An inequality constraint $g_j(\boldsymbol{x}) \leq 0$ is said to be active at $\boldsymbol{x}^*$ if $g_j(\boldsymbol{x}^*)=0$. It is inactive at $\boldsymbol{x}^*$ if $g_j(\boldsymbol{x}^*)<0$.

By convention, we consider an equality constraint $h_i(\boldsymbol{x})=0$ to be always active.
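For example, consider the single constraint $g_1(\boldsymbol{x}) = x_1^2 + x_2^2 - 1 \leq 0$. Then

$$g_1(1,0) = 0 \quad (\text{active}), \qquad g_1(0,0) = -1 < 0 \quad (\text{inactive}),$$

so the constraint is active exactly on the unit circle and inactive in its interior.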

Definition

Let $\boldsymbol{x}^*$ satisfy $\boldsymbol{h}(\boldsymbol{x}^*)=\mathbf{0}$, $\boldsymbol{g}(\boldsymbol{x}^*) \leq \mathbf{0}$, and let $J(\boldsymbol{x}^*)$ be the index set of active inequality constraints:

$$J(\boldsymbol{x}^*) \triangleq \{j: g_j(\boldsymbol{x}^*)=0\}.$$

Then, we say that $\boldsymbol{x}^*$ is a regular point if the vectors

$$\nabla h_i(\boldsymbol{x}^*), \quad \nabla g_j(\boldsymbol{x}^*), \qquad 1 \leq i \leq m, \quad j \in J(\boldsymbol{x}^*)$$

are linearly independent. We now prove a first-order necessary condition for a point to be a local minimizer. We call this condition the Karush-Kuhn-Tucker (KKT) condition. In the literature, this condition is sometimes also called the Kuhn-Tucker condition.
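Before stating the theorem, it may help to see how regularity can fail. For instance, consider the two constraints

$$g_1(\boldsymbol{x}) = (x_1-1)^2 + x_2^2 - 1 \leq 0, \qquad g_2(\boldsymbol{x}) = (x_1+1)^2 + x_2^2 - 1 \leq 0.$$

At $\boldsymbol{x}^* = \mathbf{0}$ both constraints are active, and

$$\nabla g_1(\mathbf{0}) = \begin{bmatrix} -2 \\ 0 \end{bmatrix}, \qquad \nabla g_2(\mathbf{0}) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

are linearly dependent, so $\boldsymbol{x}^*$ is not a regular point.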

Karush-Kuhn-Tucker (KKT) Theorem

Let $f, \boldsymbol{h}, \boldsymbol{g} \in \mathcal{C}^1$. Let $\boldsymbol{x}^*$ be a regular point and a local minimizer for the problem of minimizing $f$ subject to $\boldsymbol{h}(\boldsymbol{x})=\mathbf{0}$, $\boldsymbol{g}(\boldsymbol{x}) \leq \mathbf{0}$. Then, there exist $\boldsymbol{\lambda}^* \in \mathbb{R}^m$ and $\boldsymbol{\mu}^* \in \mathbb{R}^p$ such that:

  1. $\boldsymbol{\mu}^* \geq \mathbf{0}$;
  2. $Df(\boldsymbol{x}^*) + \boldsymbol{\lambda}^{*\top} D\boldsymbol{h}(\boldsymbol{x}^*) + \boldsymbol{\mu}^{*\top} D\boldsymbol{g}(\boldsymbol{x}^*) = \mathbf{0}^{\top}$;
  3. $\boldsymbol{\mu}^{*\top} \boldsymbol{g}(\boldsymbol{x}^*) = 0$.

In this theorem, we refer to $\boldsymbol{\lambda}^*$ as the Lagrange multiplier vector and $\boldsymbol{\mu}^*$ as the Karush-Kuhn-Tucker (KKT) multiplier vector. We refer to their components as Lagrange multipliers and Karush-Kuhn-Tucker (KKT) multipliers, respectively.

Before proving this theorem, let us first discuss its meaning. Observe that $\mu_j^* \geq 0$ (by condition 1) and $g_j(\boldsymbol{x}^*) \leq 0$. Therefore, the condition

$$\boldsymbol{\mu}^{*\top} \boldsymbol{g}(\boldsymbol{x}^*) = \mu_1^* g_1(\boldsymbol{x}^*) + \cdots + \mu_p^* g_p(\boldsymbol{x}^*) = 0$$

implies that if $g_j(\boldsymbol{x}^*) < 0$, then $\mu_j^* = 0$; that is, for all $j \notin J(\boldsymbol{x}^*)$, we have $\mu_j^* = 0$. In other words, the KKT multipliers $\mu_j^*$ corresponding to inactive constraints are zero. The other KKT multipliers, $\mu_i^*$, $i \in J(\boldsymbol{x}^*)$, are nonnegative; they may or may not be equal to zero.
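For a concrete illustration, consider the problem

$$\text{minimize } x_1^2 + x_2^2 \quad \text{subject to } g_1(\boldsymbol{x}) = 1 - x_1 - x_2 \leq 0.$$

Here $\nabla f(\boldsymbol{x}) = [2x_1, 2x_2]^{\top}$ and $\nabla g_1(\boldsymbol{x}) = [-1, -1]^{\top}$. At $\boldsymbol{x}^* = [1/2, 1/2]^{\top}$ the constraint is active, and conditions 2 and 3 read

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \mu_1^* \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mu_1^* g_1(\boldsymbol{x}^*) = 0,$$

which hold with $\mu_1^* = 1 \geq 0$. Had the constraint been inactive at the candidate point, complementary slackness would force $\mu_1^* = 0$, and condition 2 would reduce to $\nabla f(\boldsymbol{x}^*) = \mathbf{0}$.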

Proof

Let $\boldsymbol{x}^*$ be a regular local minimizer of $f$ on the set $\{\boldsymbol{x}: \boldsymbol{h}(\boldsymbol{x})=\mathbf{0}, \boldsymbol{g}(\boldsymbol{x}) \leq \mathbf{0}\}$. Then, $\boldsymbol{x}^*$ is also a regular local minimizer of $f$ on the set $\{\boldsymbol{x}: \boldsymbol{h}(\boldsymbol{x})=\mathbf{0},\ g_j(\boldsymbol{x})=0,\ j \in J(\boldsymbol{x}^*)\}$ (see Exercise 21.16). Note that the latter constraint set involves only equality constraints. Therefore, from Lagrange's theorem, it follows that there exist vectors $\boldsymbol{\lambda}^* \in \mathbb{R}^m$ and $\boldsymbol{\mu}^* \in \mathbb{R}^p$ such that

$$Df(\boldsymbol{x}^*) + \boldsymbol{\lambda}^{*\top} D\boldsymbol{h}(\boldsymbol{x}^*) + \boldsymbol{\mu}^{*\top} D\boldsymbol{g}(\boldsymbol{x}^*) = \mathbf{0}^{\top},$$

where for all $j \notin J(\boldsymbol{x}^*)$, we have $\mu_j^* = 0$. To complete the proof it remains to show that for all $j \in J(\boldsymbol{x}^*)$, we have $\mu_j^* \geq 0$ (and hence for all $j = 1, \ldots, p$, we have $\mu_j^* \geq 0$, i.e., $\boldsymbol{\mu}^* \geq \mathbf{0}$). We use a proof by contradiction. So suppose that there exists $j \in J(\boldsymbol{x}^*)$ such that $\mu_j^* < 0$. Let $\hat{S}$ and $\hat{T}(\boldsymbol{x}^*)$ be the surface and tangent space defined by all the other active constraints at $\boldsymbol{x}^*$. Specifically,

$$\hat{S} = \{\boldsymbol{x}: \boldsymbol{h}(\boldsymbol{x})=\mathbf{0},\ g_i(\boldsymbol{x})=0,\ i \in J(\boldsymbol{x}^*),\ i \neq j\}$$

and

$$\hat{T}(\boldsymbol{x}^*) = \{\boldsymbol{y}: D\boldsymbol{h}(\boldsymbol{x}^*)\boldsymbol{y} = \mathbf{0},\ Dg_i(\boldsymbol{x}^*)\boldsymbol{y} = 0,\ i \in J(\boldsymbol{x}^*),\ i \neq j\}.$$

We claim that by the regularity of $\boldsymbol{x}^*$, there exists $\boldsymbol{y} \in \hat{T}(\boldsymbol{x}^*)$ such that

$$Dg_j(\boldsymbol{x}^*)\boldsymbol{y} \neq 0.$$

To see this, suppose that for all $\boldsymbol{y} \in \hat{T}(\boldsymbol{x}^*)$, $\nabla g_j(\boldsymbol{x}^*)^{\top}\boldsymbol{y} = Dg_j(\boldsymbol{x}^*)\boldsymbol{y} = 0$. This implies that $\nabla g_j(\boldsymbol{x}^*) \in \hat{T}(\boldsymbol{x}^*)^{\perp}$. By Lemma 20.1, this, in turn, implies that

$$\nabla g_j(\boldsymbol{x}^*) \in \operatorname{span}\left[\nabla h_k(\boldsymbol{x}^*),\ k = 1, \ldots, m,\ \nabla g_i(\boldsymbol{x}^*),\ i \in J(\boldsymbol{x}^*),\ i \neq j\right].$$

But this contradicts the fact that $\boldsymbol{x}^*$ is a regular point, which proves our claim. Without loss of generality, we assume that we have $\boldsymbol{y}$ such that $Dg_j(\boldsymbol{x}^*)\boldsymbol{y} < 0$ (if instead $Dg_j(\boldsymbol{x}^*)\boldsymbol{y} > 0$, replace $\boldsymbol{y}$ by $-\boldsymbol{y}$, which also lies in the subspace $\hat{T}(\boldsymbol{x}^*)$). Consider the Lagrange condition, rewritten as

$$Df(\boldsymbol{x}^*) + \boldsymbol{\lambda}^{*\top} D\boldsymbol{h}(\boldsymbol{x}^*) + \mu_j^* Dg_j(\boldsymbol{x}^*) + \sum_{i \neq j} \mu_i^* Dg_i(\boldsymbol{x}^*) = \mathbf{0}^{\top}.$$

If we postmultiply the above by $\boldsymbol{y}$ and use the fact that $\boldsymbol{y} \in \hat{T}(\boldsymbol{x}^*)$ (so that $D\boldsymbol{h}(\boldsymbol{x}^*)\boldsymbol{y} = \mathbf{0}$ and $Dg_i(\boldsymbol{x}^*)\boldsymbol{y} = 0$ for $i \neq j$), we get

$$Df(\boldsymbol{x}^*)\boldsymbol{y} = -\mu_j^* Dg_j(\boldsymbol{x}^*)\boldsymbol{y}.$$

Because $Dg_j(\boldsymbol{x}^*)\boldsymbol{y} < 0$ and we have assumed that $\mu_j^* < 0$, we have

$$Df(\boldsymbol{x}^*)\boldsymbol{y} < 0.$$

Because $\boldsymbol{y} \in \hat{T}(\boldsymbol{x}^*)$, by Theorem 20.1 we can find a differentiable curve $\{\boldsymbol{x}(t): t \in (a,b)\}$ on $\hat{S}$ such that there exists $t^* \in (a,b)$ with $\boldsymbol{x}(t^*) = \boldsymbol{x}^*$ and $\dot{\boldsymbol{x}}(t^*) = \boldsymbol{y}$. Now,

$$\frac{d}{dt} f(\boldsymbol{x}(t^*)) = Df(\boldsymbol{x}^*)\boldsymbol{y} < 0.$$

The above means that there is a $\delta > 0$ such that for all $t \in (t^*, t^* + \delta]$, we have

$$f(\boldsymbol{x}(t)) < f(\boldsymbol{x}(t^*)) = f(\boldsymbol{x}^*).$$

On the other hand,

$$\frac{d}{dt} g_j(\boldsymbol{x}(t^*)) = Dg_j(\boldsymbol{x}^*)\boldsymbol{y} < 0,$$

and for some $\varepsilon > 0$ and all $t \in [t^*, t^* + \varepsilon]$, we have $g_j(\boldsymbol{x}(t)) \leq 0$. Therefore, for all $t \in (t^*, t^* + \min\{\delta, \varepsilon\}]$, we have $g_j(\boldsymbol{x}(t)) \leq 0$ and $f(\boldsymbol{x}(t)) < f(\boldsymbol{x}^*)$. Because the points $\boldsymbol{x}(t)$, $t \in (t^*, t^* + \min\{\delta, \varepsilon\}]$, are in $\hat{S}$, they are feasible points with lower objective function values than $\boldsymbol{x}^*$. This contradicts the assumption that $\boldsymbol{x}^*$ is a local minimizer, which completes the proof.
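As a quick numerical sanity check of the theorem, here is a minimal Python sketch (using only NumPy; the function names are our own, not from the text) that verifies the three KKT conditions for the worked example above at the analytically obtained pair $\boldsymbol{x}^* = (1/2, 1/2)$, $\mu_1^* = 1$.

```python
import numpy as np

# Worked example from above: minimize f(x) = x1^2 + x2^2
# subject to g(x) = 1 - x1 - x2 <= 0 (no equality constraints).

def grad_f(x):
    # Gradient of f(x) = x1^2 + x2^2.
    return 2.0 * x

def g(x):
    # Inequality constraint, written in the g(x) <= 0 form.
    return 1.0 - x[0] - x[1]

def grad_g(x):
    # Gradient of g; constant, since g is affine.
    return np.array([-1.0, -1.0])

# Candidate KKT pair, obtained analytically in the example above.
x_star = np.array([0.5, 0.5])
mu_star = 1.0

# Condition 1: nonnegativity of the KKT multiplier.
assert mu_star >= 0.0

# Condition 2: stationarity, Df(x*) + mu* Dg(x*) = 0.
stationarity = grad_f(x_star) + mu_star * grad_g(x_star)
assert np.allclose(stationarity, 0.0)

# Condition 3: complementary slackness, mu* g(x*) = 0.
assert np.isclose(mu_star * g(x_star), 0.0)

# Feasibility: g(x*) <= 0.
assert g(x_star) <= 1e-12

print("All KKT conditions hold at x* =", x_star, "with mu* =", mu_star)
```

Because $f$ is convex and the constraint is affine, the KKT conditions are sufficient as well as necessary for this particular problem, so the check above certifies a global minimizer.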